Ijraset Journal For Research in Applied Science and Engineering Technology
Authors: Ranjith K, Rishi Sagar BK, Uma Sharma
DOI Link: https://doi.org/10.22214/ijraset.2024.63709
The market for second-hand luxury cars in India is witnessing a significant surge, expected to grow at a rate of 16.30% from 2024 to 2032. This growth is fueled by increased car manufacturing, rising disposable incomes, and a shift in consumer preferences towards luxury brands. However, accurately determining the resale value of these vehicles presents a challenge due to various influencing factors. In this dynamic market, informed decision-making is crucial for luxury car buyers. Digital platforms have revolutionized access to real-time market data, helping both buyers and sellers stay updated on pricing trends. Our research explores the complexities of predicting prices for pre-owned luxury cars and introduces a predictive analytics framework using advanced machine learning algorithms. We collected and preprocessed a comprehensive dataset and conducted an in-depth exploratory data analysis. Various regression techniques, including Linear Regression, Decision Tree, Random Forest, and Extreme Gradient Boosting, were employed to forecast prices. These models were evaluated using metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE) to identify the most accurate predictive model. This study offers a systematic solution for price prediction, enhancing the buying process for stakeholders in the second-hand luxury car market.
I. INTRODUCTION
The Indian market for used cars has nearly doubled in value over the past decade, particularly in the luxury segment. High prices of new cars, influenced by manufacturers and government taxes, make them unaffordable for many, driving the middle class to the pre-owned market. Online platforms like Car Dekho, Quikr, Carwale, and Cars24 have simplified the buying and selling process by providing essential price-influencing information. However, accurately valuing used cars remains challenging due to factors like mileage, manufacturing year, engine size, transmission type, and power. This research leverages artificial intelligence and machine learning algorithms to predict the resale value of pre-owned luxury cars in India, using various prediction models to compare accuracy. By analyzing data from platforms and considering multiple vehicle features, the study aims to establish a reliable method for determining market value. The findings will save time and effort for stakeholders and provide insights into price variations by body type and manufacturing year. Additionally, manufacturers like Mercedes-Benz, Toyota, and Honda can use these data-driven insights to optimize production and stay competitive in the growing pre-owned car market.
A. Related Works
II. PROPOSED METHODOLOGY
In this study, a predictive model is built by applying several machine learning techniques to predict the prices of pre-owned cars. The model takes a range of vehicle parameters into account and relies on regression analysis. The architecture of the proposed system is depicted in Fig-1.
A. Data Acquisition
Initially, data from various cars is collected, including both the features and the target variable, which is the price.
B. Data Cleaning
This step involves identifying and removing any null values, filling in missing values, and eliminating outliers from the dataset.
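The missing-value part of this step can be sketched as follows. This is a minimal illustration, not the authors' procedure: the helper name `clean_records`, the choice of the median as the fill value, and the sample rows are all assumptions made for the example.

```python
from statistics import median

def clean_records(records, feature, target):
    """Drop rows with a missing target, then fill missing feature values with the median."""
    # Rows without a price cannot be used for supervised training, so they are dropped.
    kept = [dict(r) for r in records if r.get(target) is not None]
    # The fill value is computed from the observed (non-missing) feature values only.
    observed = [r[feature] for r in kept if r.get(feature) is not None]
    fill_value = median(observed)
    for r in kept:
        if r.get(feature) is None:
            r[feature] = fill_value
    return kept

# Hypothetical records; the values are illustrative, not taken from the study's dataset.
rows = [
    {"km_driven": 20000, "selling_price": 2500000},
    {"km_driven": None, "selling_price": 1800000},
    {"km_driven": 60000, "selling_price": None},
    {"km_driven": 40000, "selling_price": 2100000},
]
cleaned = clean_records(rows, "km_driven", "selling_price")
```

The median is used here because it is less sensitive to extreme values than the mean, which matters for skewed fields such as kilometres driven. Other fill strategies would be equally valid.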
C. Preprocessing
The data is then pre-processed using either normalization or standardization techniques to ensure that all variables are on a similar scale.
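The two scaling options named above can be sketched as follows. The function names and the sample `km_driven` values are hypothetical, and the paper does not say which of the two techniques was applied to which variable.

```python
from statistics import mean, pstdev

def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical kilometre readings, for illustration only.
km_driven = [10000, 20000, 30000, 40000, 50000]
normalized = min_max_normalize(km_driven)
standardized = standardize(km_driven)
```

A common practice is to compute the scaling parameters (minimum, maximum, mean, standard deviation) on the training set only and reuse them for the testing set, so that no information from the test data leaks into training.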
D. Exploratory Data Analysis (EDA)
EDA is conducted to gain insights into the data by examining patterns, detecting anomalies, testing hypotheses, and validating assumptions. This is achieved through the use of summary statistics and graphical representations.
E. Division into Training and Testing Sets
The pre-processed dataset is divided into two subsets, namely the training set and the testing set. This division allows for the evaluation of the model's performance on unseen data.
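A minimal sketch of such a split is shown below. The 80/20 ratio and the fixed seed are assumptions for illustration, since the paper does not report the split it used, and `train_test_split` here is a hand-written helper rather than a library call.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and return (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Stand-in for dataset rows: ten record identifiers.
records = list(range(10))
train, test = train_test_split(records, test_fraction=0.2)
```

Fixing the seed makes the split repeatable, so different models can be compared on exactly the same held-out records.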
F. Model Training
The training features are used to train the model using various machine learning algorithms, specifically regression techniques.
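As an illustration of the simplest regression technique in this family, the sketch below fits a one-feature linear regression by ordinary least squares. It is not the authors' implementation: the function names are hypothetical, the prices are made-up values in lakh rupees, and the study's models use many features rather than one.

```python
from statistics import mean

def fit_simple_linear_regression(x, y):
    """Return (slope, intercept) minimising the squared error of y = slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def predict(x, slope, intercept):
    """Apply the fitted line to new feature values."""
    return [slope * xi + intercept for xi in x]

# Hypothetical training data: vehicle age in years and price in lakh rupees.
vehicle_age = [1, 2, 3, 4, 5]
selling_price = [30.0, 27.0, 24.0, 21.0, 18.0]
slope, intercept = fit_simple_linear_regression(vehicle_age, selling_price)
```

On this toy data the fitted slope is negative, meaning each additional year of age lowers the predicted price. Tree-based and boosted models replace the single line with more flexible functions, but the train-then-predict workflow is the same.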
G. Prediction on the Testing Dataset
The trained model is then used to make predictions on the testing dataset. The predicted values are compared with the actual values to assess the accuracy of the model's predictions, ultimately enabling the prediction of the price.
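The comparison between predicted and actual values can be quantified with the error metrics named in the abstract (MAE, MSE, and RMSE). The sketch below computes them from their standard definitions. The helper name and the sample prices are hypothetical.

```python
from math import sqrt

def regression_errors(actual, predicted):
    """Compute MAE, MSE and RMSE for paired actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    residuals = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(r) for r in residuals) / len(residuals)
    mse = sum(r * r for r in residuals) / len(residuals)
    return {"mae": mae, "mse": mse, "rmse": sqrt(mse)}

# Hypothetical test-set prices (in lakh rupees) and model predictions.
actual = [30.0, 25.0, 20.0, 15.0]
predicted = [28.0, 26.0, 20.0, 18.0]
errors = regression_errors(actual, predicted)
```

MAE and RMSE are expressed in the same unit as the price, which makes them easier to interpret than MSE. RMSE penalises large individual errors more heavily than MAE does.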
III. MODELLING AND RESULT ANALYSIS
A. Data Acquisition
The dataset used in this study was downloaded from Kaggle and originates from the Car Dekho platform. It has been transformed, cleaned, and filtered to cover only the luxury car market. The dataset comprises 1633 used car records. The variables selected for the research are as follows: car_name, car_brand, model, purchase_year, vehicle_age, km_driven, seller_type, fuel_type, transmission_type, mileage, engine_power, max_power, seats, selling_price, inflation_rate, and depreciation_rate.
B. Data Cleaning
After transforming the data in Excel and filtering out non-luxury cars, the dataset contains no null values and is ready for analysis.
C. Unique values and info about the dataset
To gain a deeper understanding of the used car data, we use box plots to explore the distribution of each variable. This allows us to identify the central tendency, spread, and potential outliers in the data. Outliers are to be expected here: compared to new cars, used cars naturally exhibit a wider range of values across features such as age and kilometers driven. These outliers can themselves be informative, and analyzing them can reveal unusual records in the dataset. Acknowledging their presence lets us make informed choices during the analysis phase, either by selecting statistical methods that are less sensitive to outliers or by investigating the reasons behind them.
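The outliers that a box plot draws as individual points are conventionally those lying more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that rule directly. The function names and the sample readings are hypothetical, and the quartile interpolation method used here may differ slightly from the one used by a given plotting library.

```python
from statistics import quantiles

def tukey_fences(values, k=1.5):
    """Return the (lower, upper) fences at k times the IQR beyond the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outliers(values, k=1.5):
    """Return the values that fall outside the fences."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]

# Hypothetical kilometre readings with one unusually high value.
km_driven = [12000, 18000, 25000, 30000, 34000, 41000, 48000, 52000, 250000]
outliers = flag_outliers(km_driven)
```

Flagging rather than deleting keeps the decision open: a flagged record can be inspected first and removed only if it turns out to be a data-entry error.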
E. Key Insights from the Pair Plot (Figure-11):
1. vehicle_age vs. selling_price:
There is a clear negative correlation between vehicle age and selling price. As the age of the vehicle increases, the selling price generally decreases. This is expected as older vehicles typically have lower market value.
2. km_driven vs. selling_price:
There is also a negative correlation between kilometers driven and selling price. Vehicles that have been driven more tend to have lower selling prices, which makes sense because higher mileage often indicates more wear and tear.
3. Mileage vs. selling_price:
There doesn't appear to be a strong correlation between mileage and selling price. The scatter plot shows a wide range of selling prices for different mileage values, suggesting that other factors might play a more significant role in determining the selling price.
4. engine_power vs. selling_price:
There is a positive correlation between engine power and selling price. Vehicles with higher engine power tend to have higher selling prices, which could be due to the perception of higher performance and possibly higher manufacturing costs.
5. max_power vs. selling_price:
Similar to engine power, there is a positive correlation between max power and selling price. Higher max power is associated with higher selling prices.
6. seats vs. selling_price:
There doesn't seem to be a strong or clear pattern between the number of seats and selling price. The plot shows scattered points without a distinct trend, suggesting that the number of seats alone is not a significant determinant of selling price.
7. depreciation_rate vs. selling_price:
There is a negative correlation between depreciation rate and selling price. Higher depreciation rates are associated with lower selling prices. This indicates that vehicles that depreciate more quickly lose their value faster.
selling_price distribution: The histogram for selling_price shows that most cars have a lower selling price, with a few cars having very high selling prices. This indicates a right-skewed distribution, which is common in price-related data.
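The visual impressions from the pair plot can be checked numerically with the Pearson correlation coefficient, which ranges from -1 (perfect negative linear relationship) to +1 (perfect positive). The sketch below is illustrative only: the helper name and all data values are hypothetical and do not reproduce the study's figures.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    x_bar, y_bar = mean(x), mean(y)
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    var_x = sum((a - x_bar) ** 2 for a in x)
    var_y = sum((b - y_bar) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical samples: price falls with age and rises with max power.
vehicle_age = [1, 2, 3, 4, 5, 6]
price_by_age = [42.0, 38.5, 33.0, 30.0, 24.5, 21.0]
max_power = [150, 190, 245, 180, 258, 335]
price_by_power = [18.0, 24.0, 33.0, 21.5, 36.0, 52.0]

r_age = pearson_r(vehicle_age, price_by_age)
r_power = pearson_r(max_power, price_by_power)
```

A coefficient near zero, as the text suggests for mileage and seats, indicates no strong linear relationship, although a non-linear one could still exist.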
IV. CONCLUSION
The study's objective was to predict the prices of used cars using various machine learning models, aiming for high accuracy and low prediction error. The process began with extensive data cleaning to remove null values and outliers, ensuring the dataset's integrity. Following the data preparation, multiple machine learning models were employed to forecast resale car prices. A thorough examination of the dataset was performed using data visualization tools, revealing the relationships between different features and aiding in the selection of relevant predictors for the models. The evaluation of these models indicated that the Artificial Neural Network (ANN) model outperformed the others in predicting used car prices, achieving an R² score of 0.99. This score indicates that the ANN model explains 99% of the variance in selling price, demonstrating its capability in handling complex relationships within the data. The success of the ANN model highlights the potential of advanced machine learning techniques in accurately forecasting used car prices. However, this work also recognizes the importance of continuous improvement and proposes future directions for enhancing predictive accuracy. One proposed direction is the application of deep learning algorithms to the same dataset. Deep learning models, with their ability to capture intricate patterns and relationships in large datasets, could further refine price predictions and reduce errors. In summary, this study demonstrates the efficacy of machine learning models, particularly ANN, in predicting used car prices. The promising results pave the way for further research using deep learning techniques and diverse datasets, aiming to enhance the accuracy and reliability of price predictions in the used car market. This continuous pursuit of improvement will not only benefit stakeholders in making informed decisions but also contribute to the advancement of predictive analytics in the automotive industry.
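For reference, the R² score mentioned above is assumed here to be the standard coefficient of determination; the paper does not state the formula it used. With actual prices y_i, predicted prices ŷ_i, and mean actual price ȳ over n test records, it is defined as:

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

Under this definition, a score of 0.99 means that the residual sum of squares is 1% of the total sum of squares around the mean price.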
Copyright © 2024 Ranjith K, Rishi Sagar BK, Uma Sharma. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET63709
Publish Date : 2024-07-21
ISSN : 2321-9653
Publisher Name : IJRASET